Trivial Pursuits

Once again I find myself penning a missive on trivia, rather than matters of import. Still, now we’re here…

Regular expressions. To a large extent, I hate them. To begin with, they are a complete misnomer. They are neither regular, nor expressions. Don’t believe me? To demonstrate this I’ll consult an authority on such topics. The Oxford English Dictionary would be handy, but being incredibly lax I don’t seem to have a copy lying around. However, dictionary.com will do…

reg·u·lar
adj.

   1. Customary, usual, or normal: the train’s regular schedule.
   2. Orderly, even, or symmetrical: regular teeth.
   3. In conformity with a fixed procedure, principle, or discipline.
   4. Well-ordered; methodical: regular habits.
   5. Occurring at fixed intervals; periodic: regular payments.
   6.
         1. Occurring with normal or healthy frequency.
         2. Having bowel movements or menstrual periods with normal or healthy frequency.
   7. Not varying; constant.
   8. Formally correct; proper.
   9. Having the required qualifications for an occupation: not a regular lawyer.
  10. Informal. Complete; thorough: a regular scoundrel.
  11. Informal. Good; nice: a regular guy.
  12. Botany. Having symmetrically arranged parts of similar size and shape: regular flowers.
  13. Grammar. Conforming to the usual pattern of inflection, derivation, or word formation.
  14. Ecclesiastical. Belonging to a religious order and bound by its rules: the regular clergy.
  15. Mathematics.
         1. Having equal sides and equal angles. Used of polygons.
         2. Having faces that are congruent regular polygons and congruent polyhedral angles. Used of polyhedrons.
  16. Belonging to or constituting the permanent army of a nation.

ex·pres·sion
n.

   1. The act of expressing, conveying, or representing in words, art, music, or movement; a manifestation: an expression of rural values.
   2. Something that expresses or communicates: Let this plaque serve as an expression of our esteem.
   3. Mathematics. A symbol or combination of symbols that represents a quantity or a relationship between quantities.
   4. The manner in which one expresses oneself, especially in speaking, depicting, or performing.
   5. A particular word or phrase: “an old Yankee expression… ‘Stand up and be counted'” (Charles Kuralt).
   6. The outward manifestation of a mood or a disposition: My tears are an expression of my grief.
   7. A facial aspect or a look that conveys a special feeling: an expression of scorn.
   8. The act of pressing or squeezing out.
   9. Genetics. The act or process of expressing a gene.
Now, I know someone’s going to tell me that if you look up “regular expression” on dictionary.com it gives you the “proper” definition. But to that I say, bah. There’s far too much rather pants terminology in the world, particularly in the computing arena. Next you’ll be telling me that “non-deterministic finite state automaton” is easier to say than “state machine”. And put those 5-tuples away, you don’t know where they’ve been. It’s not big and it’s not clever.

Without getting overly semantic (though I may deviate from the canonical definition of “overly” here) regular expressions are not regular. There are at least three different flavours of which I’m aware, all of which have their own magical incantations: Perl-compatible (as seen in Perl – shocking I know – PHP, and the open source PCRE engine, http://pcre.org); Microsoft (as seen in Visual Studio’s Find/Replace, with .NET’s regular expression library speaking a different, more Perl-like dialect again); and Java (as seen in, well, Java). “Expressions” expressed in one dialect are not necessarily freely transferable to the others. This doesn’t meet any sensible definition of regular that I’m aware of. “Self-consistent” would perhaps be more appropriate.

Nor are they expressions. Expressions tend, by definition, to express something to somebody. I blame Perl entirely for this. Any “high level” language in which software can be expressed entirely without reference to any alphabet used by a civilisation, past or present, cannot claim to adequately express anything. If I wanted unreadable gumf, I’d write software in pure machine code. With a decent lookup table and a few cups of coffee it wouldn’t be much less straightforward.

However, despite their inadequate definition and manifest unreadability, they do have some extraordinary merits.

Today, for example, ten minutes of faffing around with the Visual Studio RE (see, even handy, descriptive phrases like “regular expression” are occasionally too long to type) syntax, trying to remember which bits are Microsoft-specific and which are generally applicable, and eventually resorting to the help, proved to be exceedingly handy.

Our software tends to be sold in many international markets, and thus internationalisation (I18N; which isn’t a regular expression, but another bludgeoningly stupid acronym bordering on haxx0r lingo) and localisation (L10N, if I recall…but please don’t tell me if I’m wrong, I’d like to keep it as a surprise) are an issue for us. All the strings in our source code need to be pulled out of resources, rather than hard-coded. This is sensible practice anyway, but it means you can no longer express text literally and have to dereference it instead.

Previously I did it as follows. Rather than writing, e.g.:

using System.Diagnostics;

namespace MyCompany.MyApp
{
    class HelloWorld
    {
        HelloWorld()
        {
            string name = "world";
            string message = string.Format("Hello, {0}", name);
            Debug.WriteLine(message);
        }
    }
}

One might write:

using System.Diagnostics;
using System.Resources;

namespace MyCompany.MyApp
{
    class HelloWorld
    {
        // Base name of the resource file, plus the assembly that contains it.
        private static ResourceManager m_Resources = new ResourceManager("MyCompany.MyApp.HelloWorldClass_Resources", typeof(HelloWorld).Assembly);

        HelloWorld()
        {
            string name = m_Resources.GetString("Name.Text");
            string message = string.Format(m_Resources.GetString("HelloWorldMessage.Text"), name);
            Debug.WriteLine(message);
        }
    }
}

Which is not overlong, but is a bit less expressive. Note how I decided at some point to adopt a “dotted text” convention for my resource string names. Not a particularly clever idea, in retrospect, since most of them end in “.Text”, except when I decide they should end in “.Caption” because they represent, e.g., a button caption on a form. Of course then you discover that the button’s property is also called Text rather than Caption, but there’d be retyping involved in changing it…and so on.

Nowadays I’ve adopted a rather simpler syntax. Assuming the existence of a handy UI class with some helpful static methods, I tend to write as follows:

using System.Diagnostics;

namespace MyCompany.MyApp
{
    class HelloWorld
    {
        // ResourceStrings and UI are our own helper classes, not part of .NET.
        private static ResourceStrings S = UI.GetResourceStrings(typeof(HelloWorld));

        HelloWorld()
        {
            Debug.WriteLine(string.Format(S["HelloX"], S["World"]));
        }
    }
}

Which I think is remarkably tidy by comparison. Probably because it reminds me of the Visual C++ _T() macro, which wraps text that, at compile time, may or may not turn out to be Unicode.
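Neither ResourceStrings nor UI.GetResourceStrings is shown here, so the following is only a minimal sketch of how such a helper might look. The constructor taking a lookup delegate, the "<full type name>_Resources" naming convention, and the fall-back-to-the-key behaviour are all assumptions made for illustration. The demo uses a dictionary in place of a real resource file so that it runs on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Resources;

namespace MyCompany.MyApp
{
    // Hypothetical wrapper: exposes string lookup through an indexer.
    public sealed class ResourceStrings
    {
        private readonly Func<string, string> m_Lookup;

        public ResourceStrings(Func<string, string> lookup)
        {
            m_Lookup = lookup ?? throw new ArgumentNullException(nameof(lookup));
        }

        // If the lookup returns null, return the key itself so the gap is visible.
        public string this[string key] => m_Lookup(key) ?? key;
    }

    public static class UI
    {
        // Assumed naming convention: "<full type name>_Resources".
        public static ResourceStrings GetResourceStrings(Type type)
        {
            var manager = new ResourceManager(type.FullName + "_Resources", type.Assembly);
            return new ResourceStrings(manager.GetString);
        }
    }

    public static class Demo
    {
        public static void Main()
        {
            // Stand-in for a real resource file.
            var table = new Dictionary<string, string>
            {
                ["HelloX"] = "Hello, {0}",
                ["World"] = "world",
            };
            var S = new ResourceStrings(key => table.TryGetValue(key, out var value) ? value : null);

            Console.WriteLine(string.Format(S["HelloX"], S["World"])); // prints "Hello, world"
        }
    }
}
```

Keeping the indexer on a small wrapper, rather than calling ResourceManager directly, is what lets the call sites shrink to S["..."].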

Anyhoo. The point was that, having adopted this new convention, I was faced with having to change a number of source files to use the new style rather than the old.

Conventional search-and-replace is of little value here. Replacing m_Resources.GetString("SomeText") with S["SomeText"] can’t quite be done. You could try it in two parts, replacing m_Resources.GetString(" with S[", and replacing ") with "], but you’re bound to find a number of ") sequences littered all over the shop that belong to things other than a GetString() call, and you’ll have to fix those manually.
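To see the collateral damage concretely, here is a small sketch of the two-part literal replacement applied to a single line. The sample line and the TwoPassReplace helper are made up for illustration.

```csharp
using System;

public static class NaiveReplaceDemo
{
    // Two plain-text passes, exactly as a non-regex Find/Replace would do them.
    public static string TwoPassReplace(string source)
    {
        return source
            .Replace("m_Resources.GetString(\"", "S[\"")
            .Replace("\")", "\"]");
    }

    public static void Main()
    {
        string line = "Log(m_Resources.GetString(\"Saved.Text\"), Path.Combine(dir, \"out.txt\"));";

        // Prints: Log(S["Saved.Text"], Path.Combine(dir, "out.txt"]);
        Console.WriteLine(TwoPassReplace(line));
    }
}
```

The GetString call comes out right, but the unrelated Path.Combine call has had its closing parenthesis turned into a bracket as well.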

With regular expressions, however, you can make the houngan work his vodun for you. Since Visual Studio’s Find/Replace widget supports regular expressions, one can incant the following:

Find all   m_Resources\.GetString\({[^)]#}\)
And replace with   S[\(0,1)]

Once the sparks have settled down and the broom has finished sweeping the magician’s workshop, one finds cheerfully that tedium has been averted and it’s time for lunch.
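For anyone without the Find/Replace dialog to hand, roughly the same transformation can be sketched with .NET’s Regex class. This is an illustration rather than what was actually run: .NET’s dialect uses ( ) to capture and + for one-or-more where the dialog uses { } and #, and the class name, method name and sample line are made up.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexReplaceDemo
{
    // \. \( and \) match the literal characters; ([^)]+) captures the argument.
    private static readonly Regex GetStringCall =
        new Regex(@"m_Resources\.GetString\(([^)]+)\)");

    public static string ConvertLine(string source)
    {
        // $1 is the captured argument, quotes included.
        return GetStringCall.Replace(source, "S[$1]");
    }

    public static void Main()
    {
        string line = "Log(m_Resources.GetString(\"Saved.Text\"), Path.Combine(dir, \"out.txt\"));";

        // Prints: Log(S["Saved.Text"], Path.Combine(dir, "out.txt"));
        Console.WriteLine(ConvertLine(line));
    }
}
```

Because the pattern stops at the first closing parenthesis, it would mangle a GetString call whose argument itself contains one, so it only suits simple call sites like these.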

For those who are relentlessly interested in regular expressions, I’ve also just found the aptly named http://www.regular-expressions.info/, which does exactly what it says on the tin, if URLs were to come in tins. URL. Don’t get me started on that little moniker…

About the author

Dan Archer

Dan Archer is a Software Engineer at Red Gate and has worked on tools ranging from SQL Backup to the forthcoming SQL Response.
